{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Notebook Description\n",
    "\n",
    "Often we want to check whether two files of seizure annotations contain the same annotations. For example, if you look through a week of recordings and annotate the seizures, comparing a classifier's predictions with your annotations lets you count the false positives and (more importantly) the false negatives."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an example, we will look at the difference between a raw predictions output and the same file after it has been manually checked and the false positives removed using the GUI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the bundled example data\n",
    "checked_preds = pd.read_csv('./example_data/checked_predictions.csv', index_col=0)\n",
    "raw_preds     = pd.read_csv('./example_data/raw_predictions.csv', index_col=0)\n",
    "\n",
    "# These local files override the example data; the outputs shown below appear to come from them.\n",
    "# Replace the paths with your own files, or delete this block to use the example data.\n",
    "checked_preds = pd.read_csv('/media/jonathan/My Passport-ELE -EEG/2019/analysis pyecog/01b_baseline_annotation_only.csv',\n",
    "                             index_col=0)\n",
    "raw_preds     = pd.read_csv('/media/jonathan/My Passport-ELE -EEG/2019/analysis pyecog/prediction from Tawfeeq Library/predictions_baselineconverted01.csv',\n",
    "                            index_col=0)\n",
    "\n",
    "# Leftover scratch: this path is truncated and cannot be read, so it is commented out\n",
    "# path = r'nathan\\My Passport-ELE -EEG\\2019/analysis pyecog/prediction from Ta'\n",
    "# raw_preds = pd.read_csv(path, index_col=0)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note to jonny 2019:** the file output by the clf does not have the `[]` around the transmitter, whereas ..."
   ]
  },
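  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If one file stores the transmitter as `119` and the other as `[119]`, the `mcode_tid` keys built by `add_mcode_tid_col` will differ and no rows will match. A minimal sketch of one way to bring both files to the bracketed form before comparing is shown here. The helper name `normalise_transmitter` and the `demo` dataframe are hypothetical and are not part of the original code.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "def normalise_transmitter(value):\n",
    "    # Hypothetical helper: return the transmitter id in bracketed form, e.g. '[119]'\n",
    "    text = str(value).strip()\n",
    "    if text.startswith('[') and text.endswith(']'):\n",
    "        return text  # already bracketed, leave unchanged\n",
    "    return '[' + text + ']'\n",
    "\n",
    "# Made-up values covering the integer, bracketed and plain-string cases\n",
    "demo = pd.DataFrame({'transmitter': [119, '[119]', '142']})\n",
    "demo['transmitter'] = demo['transmitter'].apply(normalise_transmitter)\n",
    "print(demo['transmitter'].tolist())\n",
    "```\n",
    "\n",
    "Applying the same helper to the `transmitter` column of both dataframes before calling `compare_dfs` should make the keys comparable, assuming each row holds a single transmitter id."
   ]
  },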
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>old_index</th>\n",
       "      <th>filename</th>\n",
       "      <th>start</th>\n",
       "      <th>end</th>\n",
       "      <th>duration</th>\n",
       "      <th>transmitter</th>\n",
       "      <th>real_start</th>\n",
       "      <th>real_end</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>M1566408624_2019-08-21-18-30-24_tids_[142].h5</td>\n",
       "      <td>125.0</td>\n",
       "      <td>285.0</td>\n",
       "      <td>160.0</td>\n",
       "      <td>[142]</td>\n",
       "      <td>2019-08-21 18:32:29</td>\n",
       "      <td>2019-08-21 18:35:09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>M1566459024_2019-08-22-08-30-24_tids_[119].h5</td>\n",
       "      <td>1965.0</td>\n",
       "      <td>2025.0</td>\n",
       "      <td>60.0</td>\n",
       "      <td>[119]</td>\n",
       "      <td>2019-08-22 09:03:09</td>\n",
       "      <td>2019-08-22 09:04:09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>M1566459024_2019-08-22-08-30-24_tids_[119].h5</td>\n",
       "      <td>2125.0</td>\n",
       "      <td>2195.0</td>\n",
       "      <td>70.0</td>\n",
       "      <td>[119]</td>\n",
       "      <td>2019-08-22 09:05:49</td>\n",
       "      <td>2019-08-22 09:06:59</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>M1566459024_2019-08-22-08-30-24_tids_[119].h5</td>\n",
       "      <td>2290.0</td>\n",
       "      <td>2305.0</td>\n",
       "      <td>15.0</td>\n",
       "      <td>[119]</td>\n",
       "      <td>2019-08-22 09:08:34</td>\n",
       "      <td>2019-08-22 09:08:49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>M1566459024_2019-08-22-08-30-24_tids_[119].h5</td>\n",
       "      <td>2380.0</td>\n",
       "      <td>2425.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>[119]</td>\n",
       "      <td>2019-08-22 09:10:04</td>\n",
       "      <td>2019-08-22 09:10:49</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   old_index                                       filename   start     end  \\\n",
       "0          0  M1566408624_2019-08-21-18-30-24_tids_[142].h5   125.0   285.0   \n",
       "1          1  M1566459024_2019-08-22-08-30-24_tids_[119].h5  1965.0  2025.0   \n",
       "2          2  M1566459024_2019-08-22-08-30-24_tids_[119].h5  2125.0  2195.0   \n",
       "3          3  M1566459024_2019-08-22-08-30-24_tids_[119].h5  2290.0  2305.0   \n",
       "4          4  M1566459024_2019-08-22-08-30-24_tids_[119].h5  2380.0  2425.0   \n",
       "\n",
       "   duration transmitter           real_start             real_end  \n",
       "0     160.0       [142]  2019-08-21 18:32:29  2019-08-21 18:35:09  \n",
       "1      60.0       [119]  2019-08-22 09:03:09  2019-08-22 09:04:09  \n",
       "2      70.0       [119]  2019-08-22 09:05:49  2019-08-22 09:06:59  \n",
       "3      15.0       [119]  2019-08-22 09:08:34  2019-08-22 09:08:49  \n",
       "4      45.0       [119]  2019-08-22 09:10:04  2019-08-22 09:10:49  "
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "raw_preds.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(112, 8)"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "raw_preds.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(77, 8)"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "checked_preds.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>old_index</th>\n",
       "      <th>filename</th>\n",
       "      <th>start</th>\n",
       "      <th>end</th>\n",
       "      <th>duration</th>\n",
       "      <th>transmitter</th>\n",
       "      <th>real_start</th>\n",
       "      <th>real_end</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>414</td>\n",
       "      <td>M1567629024_2019-09-04-21-30-24_tids_[28, 33, ...</td>\n",
       "      <td>860.18</td>\n",
       "      <td>939.98</td>\n",
       "      <td>79.80</td>\n",
       "      <td>[119]</td>\n",
       "      <td>04/09/2019 21:44</td>\n",
       "      <td>04/09/2019 21:46</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>407</td>\n",
       "      <td>M1567611024_2019-09-04-16-30-24_tids_[28, 33, ...</td>\n",
       "      <td>2976.58</td>\n",
       "      <td>3031.56</td>\n",
       "      <td>54.98</td>\n",
       "      <td>[119]</td>\n",
       "      <td>04/09/2019 17:20</td>\n",
       "      <td>04/09/2019 17:20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>380</td>\n",
       "      <td>M1567524624_2019-09-03-16-30-24_tids_[28, 33, ...</td>\n",
       "      <td>26.26</td>\n",
       "      <td>68.79</td>\n",
       "      <td>42.53</td>\n",
       "      <td>[119]</td>\n",
       "      <td>03/09/2019 16:30</td>\n",
       "      <td>03/09/2019 16:31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>377</td>\n",
       "      <td>M1567521024_2019-09-03-15-30-24_tids_[28, 33, ...</td>\n",
       "      <td>3298.90</td>\n",
       "      <td>3347.56</td>\n",
       "      <td>48.66</td>\n",
       "      <td>[119]</td>\n",
       "      <td>03/09/2019 16:25</td>\n",
       "      <td>03/09/2019 16:26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>376</td>\n",
       "      <td>M1567521024_2019-09-03-15-30-24_tids_[28, 33, ...</td>\n",
       "      <td>2996.26</td>\n",
       "      <td>3057.73</td>\n",
       "      <td>61.47</td>\n",
       "      <td>[119]</td>\n",
       "      <td>03/09/2019 16:20</td>\n",
       "      <td>03/09/2019 16:21</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    old_index                                           filename    start  \\\n",
       "2         414  M1567629024_2019-09-04-21-30-24_tids_[28, 33, ...   860.18   \n",
       "4         407  M1567611024_2019-09-04-16-30-24_tids_[28, 33, ...  2976.58   \n",
       "7         380  M1567524624_2019-09-03-16-30-24_tids_[28, 33, ...    26.26   \n",
       "9         377  M1567521024_2019-09-03-15-30-24_tids_[28, 33, ...  3298.90   \n",
       "10        376  M1567521024_2019-09-03-15-30-24_tids_[28, 33, ...  2996.26   \n",
       "\n",
       "        end  duration transmitter        real_start          real_end  \n",
       "2    939.98     79.80       [119]  04/09/2019 21:44  04/09/2019 21:46  \n",
       "4   3031.56     54.98       [119]  04/09/2019 17:20  04/09/2019 17:20  \n",
       "7     68.79     42.53       [119]  03/09/2019 16:30  03/09/2019 16:31  \n",
       "9   3347.56     48.66       [119]  03/09/2019 16:25  03/09/2019 16:26  \n",
       "10  3057.73     61.47       [119]  03/09/2019 16:20  03/09/2019 16:21  "
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "checked_preds.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "So we expect there to be 35 false positives\n"
     ]
    }
   ],
   "source": [
    "print('So we expect there to be', raw_preds.shape[0]-checked_preds.shape[0], 'false positives')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Code for comparing the dataframes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [],
   "source": [
    "def add_mcode_tid_col(df):\n",
    "    '''Add an 'mcode_tid' column (file start code + transmitter), used to match rows.\n",
    "    Note this expects file start to be of format: M1513966209'''\n",
    "    df['mcode_tid'] = df.filename.str.slice(0,11)+'_'+df.transmitter.astype(str)\n",
    "    return df\n",
    "\n",
    "def check_overlap(series1,series2):\n",
    "    ''' Return True if the two intervals overlap.\n",
    "    pandas series should both have start and end attrs\n",
    "    http://baodad.blogspot.co.uk/2014/06/date-range-overlap.html\n",
    "    '''\n",
    "    start_a, end_a = float(series1.start), float(series1.end)\n",
    "    start_b, end_b = float(series2.start), float(series2.end)\n",
    "    overlap_bool = (start_a <= end_b) and (end_a >= start_b)\n",
    "    return overlap_bool\n",
    "\n",
    "def calculate_overlap(series1,series2):\n",
    "    ''' Return the overlap (seconds) between two intervals.\n",
    "    pandas series should both have start and end attrs.\n",
    "    Only meaningful when the intervals overlap (see check_overlap); otherwise the result is negative.\n",
    "    http://baodad.blogspot.co.uk/2014/06/date-range-overlap.html\n",
    "    '''\n",
    "    a, b = float(series1.start), float(series1.end)\n",
    "    c, d = float(series2.start), float(series2.end)\n",
    "    overlap = min([b-a,b-c,d-c,d-a])\n",
    "    return overlap\n",
    "\n",
    "def compare_dfs(prediction_df, annotation_df):\n",
    "    ''' \n",
    "    Function to check how much of prediction_df is found in annotation_df\n",
    "    \n",
    "    Returns two dataframes:\n",
    "        - preds_in_annotations_df: the predictions found within the annotations and the amount of overlap\n",
    "          (probably corresponding to true positives). Has an overlap column (seconds).\n",
    "        - preds_not_in_annotations_df: the predictions not found in the annotations\n",
    "          (probably corresponding to false positives)\n",
    "          \n",
    "    Note:\n",
    "        To check for false negatives, or missed seizures, pass in the actual annotations\n",
    "        as the 'prediction_df' and the actual predictions as the 'annotation_df'. \n",
    "        Here we would hope for no 'false positives', so the second dataframe will contain the \n",
    "        missed seizures...\n",
    "        Both input dataframes gain an 'mcode_tid' column in place.\n",
    "    '''\n",
    "    \n",
    "    # first add a 'mcode_tid' col: allows us to check for same hour and transmitter\n",
    "    prediction_df = add_mcode_tid_col(prediction_df)\n",
    "    annotation_df = add_mcode_tid_col(annotation_df)\n",
    "    \n",
    "    # Collect rows in lists and build the dataframes at the end\n",
    "    # (DataFrame.append was removed in pandas 2.0)\n",
    "    preds_in_annotations     = []\n",
    "    preds_not_in_annotations = []\n",
    "    \n",
    "    # loop over the predictions\n",
    "    for _, prediction_row_series in prediction_df.iterrows():\n",
    "        # find all annotations with same hour and tid as the prediction row\n",
    "        # this will often be zero or one row, but if >1 seizures in a single hour will be more\n",
    "        relevant_annotations_df = annotation_df[annotation_df.mcode_tid == prediction_row_series.mcode_tid]\n",
    "        \n",
    "        overlap_found = False # does the predicted seizure overlap with any annotation?\n",
    "        t_overlap     = 0     # total overlap time between the prediction and the annotations\n",
    "        for _, annotation_row_series in relevant_annotations_df.iterrows():\n",
    "            if check_overlap(prediction_row_series, annotation_row_series):\n",
    "                overlap_found = True\n",
    "                # if the prediction overlaps several annotations, the overlaps are summed\n",
    "                t_overlap += calculate_overlap(prediction_row_series,\n",
    "                                               annotation_row_series)\n",
    "        \n",
    "        if overlap_found:\n",
    "            prediction_row_series['overlap'] = t_overlap\n",
    "            preds_in_annotations.append(prediction_row_series)\n",
    "        else:\n",
    "            preds_not_in_annotations.append(prediction_row_series)\n",
    "    \n",
    "    preds_in_annotations_df     = pd.DataFrame(preds_in_annotations,\n",
    "                                               columns = list(prediction_df.columns)+['overlap'])\n",
    "    preds_not_in_annotations_df = pd.DataFrame(preds_not_in_annotations,\n",
    "                                               columns = prediction_df.columns)\n",
    "    \n",
    "    return preds_in_annotations_df, preds_not_in_annotations_df\n",
    "\n",
    "true_positives, false_positives = compare_dfs(raw_preds,checked_preds)"
   ]
  },
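  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the overlap calculation concrete, here is a small worked sketch on plain numbers rather than dataframe rows. The function name `interval_overlap` and the example times are made up for illustration and are not part of the code above.\n",
    "\n",
    "```python\n",
    "def interval_overlap(start_a, end_a, start_b, end_b):\n",
    "    # Overlap length in seconds, clamped to zero when the intervals do not intersect\n",
    "    return max(0.0, min(end_a, end_b) - max(start_a, start_b))\n",
    "\n",
    "# Hypothetical prediction from 100 s to 160 s, annotation from 120 s to 200 s:\n",
    "# they share the stretch from 120 s to 160 s\n",
    "shared = interval_overlap(100.0, 160.0, 120.0, 200.0)\n",
    "\n",
    "# Hypothetical intervals that do not touch at all\n",
    "disjoint = interval_overlap(0.0, 10.0, 20.0, 30.0)\n",
    "\n",
    "print(shared, disjoint)\n",
    "```\n",
    "\n",
    "The expression `min(end_a, end_b) - max(start_a, start_b)` equals the `min([b-a, b-c, d-c, d-a])` form used in `calculate_overlap`. It goes negative for intervals that do not intersect, which is why `compare_dfs` only sums it after `check_overlap` returns True and why this sketch clamps it at zero."
   ]
  },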
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((67, 10), (45, 9))"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "true_positives.shape, false_positives.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>old_index</th>\n",
       "      <th>filename</th>\n",
       "      <th>start</th>\n",
       "      <th>end</th>\n",
       "      <th>duration</th>\n",
       "      <th>transmitter</th>\n",
       "      <th>real_start</th>\n",
       "      <th>real_end</th>\n",
       "      <th>mcode_tid</th>\n",
       "      <th>overlap</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>M1566459024_2019-08-22-08-30-24_tids_[119].h5</td>\n",
       "      <td>1965.0</td>\n",
       "      <td>2025.0</td>\n",
       "      <td>60.0</td>\n",
       "      <td>[119]</td>\n",
       "      <td>2019-08-22 09:03:09</td>\n",
       "      <td>2019-08-22 09:04:09</td>\n",
       "      <td>M1566459024_[119]</td>\n",
       "      <td>58.43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2.0</td>\n",
       "      <td>M1566459024_2019-08-22-08-30-24_tids_[119].h5</td>\n",
       "      <td>2125.0</td>\n",
       "      <td>2195.0</td>\n",
       "      <td>70.0</td>\n",
       "      <td>[119]</td>\n",
       "      <td>2019-08-22 09:05:49</td>\n",
       "      <td>2019-08-22 09:06:59</td>\n",
       "      <td>M1566459024_[119]</td>\n",
       "      <td>39.32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.0</td>\n",
       "      <td>M1566459024_2019-08-22-08-30-24_tids_[119].h5</td>\n",
       "      <td>2290.0</td>\n",
       "      <td>2305.0</td>\n",
       "      <td>15.0</td>\n",
       "      <td>[119]</td>\n",
       "      <td>2019-08-22 09:08:34</td>\n",
       "      <td>2019-08-22 09:08:49</td>\n",
       "      <td>M1566459024_[119]</td>\n",
       "      <td>15.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4.0</td>\n",
       "      <td>M1566459024_2019-08-22-08-30-24_tids_[119].h5</td>\n",
       "      <td>2380.0</td>\n",
       "      <td>2425.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>[119]</td>\n",
       "      <td>2019-08-22 09:10:04</td>\n",
       "      <td>2019-08-22 09:10:49</td>\n",
       "      <td>M1566459024_[119]</td>\n",
       "      <td>43.08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5.0</td>\n",
       "      <td>M1566459024_2019-08-22-08-30-24_tids_[119].h5</td>\n",
       "      <td>2585.0</td>\n",
       "      <td>2635.0</td>\n",
       "      <td>50.0</td>\n",
       "      <td>[119]</td>\n",
       "      <td>2019-08-22 09:13:29</td>\n",
       "      <td>2019-08-22 09:14:19</td>\n",
       "      <td>M1566459024_[119]</td>\n",
       "      <td>50.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   old_index                                       filename   start     end  \\\n",
       "1        1.0  M1566459024_2019-08-22-08-30-24_tids_[119].h5  1965.0  2025.0   \n",
       "2        2.0  M1566459024_2019-08-22-08-30-24_tids_[119].h5  2125.0  2195.0   \n",
       "3        3.0  M1566459024_2019-08-22-08-30-24_tids_[119].h5  2290.0  2305.0   \n",
       "4        4.0  M1566459024_2019-08-22-08-30-24_tids_[119].h5  2380.0  2425.0   \n",
       "5        5.0  M1566459024_2019-08-22-08-30-24_tids_[119].h5  2585.0  2635.0   \n",
       "\n",
       "   duration transmitter           real_start             real_end  \\\n",
       "1      60.0       [119]  2019-08-22 09:03:09  2019-08-22 09:04:09   \n",
       "2      70.0       [119]  2019-08-22 09:05:49  2019-08-22 09:06:59   \n",
       "3      15.0       [119]  2019-08-22 09:08:34  2019-08-22 09:08:49   \n",
       "4      45.0       [119]  2019-08-22 09:10:04  2019-08-22 09:10:49   \n",
       "5      50.0       [119]  2019-08-22 09:13:29  2019-08-22 09:14:19   \n",
       "\n",
       "           mcode_tid  overlap  \n",
       "1  M1566459024_[119]    58.43  \n",
       "2  M1566459024_[119]    39.32  \n",
       "3  M1566459024_[119]    15.00  \n",
       "4  M1566459024_[119]    43.08  \n",
       "5  M1566459024_[119]    50.00  "
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "true_positives.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>old_index</th>\n",
       "      <th>filename</th>\n",
       "      <th>start</th>\n",
       "      <th>end</th>\n",
       "      <th>duration</th>\n",
       "      <th>transmitter</th>\n",
       "      <th>real_start</th>\n",
       "      <th>real_end</th>\n",
       "      <th>mcode_tid</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.0</td>\n",
       "      <td>M1566408624_2019-08-21-18-30-24_tids_[142].h5</td>\n",
       "      <td>125.0</td>\n",
       "      <td>285.0</td>\n",
       "      <td>160.0</td>\n",
       "      <td>[142]</td>\n",
       "      <td>2019-08-21 18:32:29</td>\n",
       "      <td>2019-08-21 18:35:09</td>\n",
       "      <td>M1566408624_[142]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>21.0</td>\n",
       "      <td>M1566765024_2019-08-25-21-30-24_tids_[119].h5</td>\n",
       "      <td>875.0</td>\n",
       "      <td>950.0</td>\n",
       "      <td>75.0</td>\n",
       "      <td>[119]</td>\n",
       "      <td>2019-08-25 21:44:59</td>\n",
       "      <td>2019-08-25 21:46:14</td>\n",
       "      <td>M1566765024_[119]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>23.0</td>\n",
       "      <td>M1566822624_2019-08-26-13-30-24_tids_[119].h5</td>\n",
       "      <td>2545.0</td>\n",
       "      <td>2625.0</td>\n",
       "      <td>80.0</td>\n",
       "      <td>[119]</td>\n",
       "      <td>2019-08-26 14:12:49</td>\n",
       "      <td>2019-08-26 14:14:09</td>\n",
       "      <td>M1566822624_[119]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>24.0</td>\n",
       "      <td>M1566822624_2019-08-26-13-30-24_tids_[119].h5</td>\n",
       "      <td>2715.0</td>\n",
       "      <td>2760.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>[119]</td>\n",
       "      <td>2019-08-26 14:15:39</td>\n",
       "      <td>2019-08-26 14:16:24</td>\n",
       "      <td>M1566822624_[119]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>25.0</td>\n",
       "      <td>M1566822624_2019-08-26-13-30-24_tids_[119].h5</td>\n",
       "      <td>2870.0</td>\n",
       "      <td>2910.0</td>\n",
       "      <td>40.0</td>\n",
       "      <td>[119]</td>\n",
       "      <td>2019-08-26 14:18:14</td>\n",
       "      <td>2019-08-26 14:18:54</td>\n",
       "      <td>M1566822624_[119]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    old_index                                       filename   start     end  \\\n",
       "0         0.0  M1566408624_2019-08-21-18-30-24_tids_[142].h5   125.0   285.0   \n",
       "21       21.0  M1566765024_2019-08-25-21-30-24_tids_[119].h5   875.0   950.0   \n",
       "23       23.0  M1566822624_2019-08-26-13-30-24_tids_[119].h5  2545.0  2625.0   \n",
       "24       24.0  M1566822624_2019-08-26-13-30-24_tids_[119].h5  2715.0  2760.0   \n",
       "25       25.0  M1566822624_2019-08-26-13-30-24_tids_[119].h5  2870.0  2910.0   \n",
       "\n",
       "    duration transmitter           real_start             real_end  \\\n",
       "0      160.0       [142]  2019-08-21 18:32:29  2019-08-21 18:35:09   \n",
       "21      75.0       [119]  2019-08-25 21:44:59  2019-08-25 21:46:14   \n",
       "23      80.0       [119]  2019-08-26 14:12:49  2019-08-26 14:14:09   \n",
       "24      45.0       [119]  2019-08-26 14:15:39  2019-08-26 14:16:24   \n",
       "25      40.0       [119]  2019-08-26 14:18:14  2019-08-26 14:18:54   \n",
       "\n",
       "            mcode_tid  \n",
       "0   M1566408624_[142]  \n",
       "21  M1566765024_[119]  \n",
       "23  M1566822624_[119]  \n",
       "24  M1566822624_[119]  \n",
       "25  M1566822624_[119]  "
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "false_positives.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Save the false positives to check through using the GUI\n",
    "\n",
    "- They might not all be false positives!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "savename = 'predictions_not_in_annotations.csv'\n",
    "false_positives.to_csv(savename,header=False, index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Flip the order of the dataframes\n",
    "\n",
    "```\n",
    "Note:\n",
    "        To check for false negatives, or missed seizures, pass in the actual annotations\n",
    "        as the 'prediction_df' and the actual predictions as the 'annotation_df'. \n",
    "        Here we would hope for no 'false positives', so the second dataframe will contain the \n",
    "        missed seizures...\n",
    "```\n",
    "\n",
    "Pass in the checked predictions as the predictions. This is similar to the case where you are checking for missed seizures."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "true_positives, false_positives = compare_dfs(checked_preds,raw_preds)     "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(69, 10)"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "true_positives.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "false_positives.shape\n",
    "savename = 'annotations_not_in_predictions.csv'\n",
    "false_positives.to_csv(savename,header=True, index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We would expect no false positives here. Any that do appear are annotations that the predictions missed (provided the predictions cover the same time period as the annotations). The run shown below has 8 such rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(8, 9)"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "false_positives.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}